FIGURE 2.10
Overview of the proposed Q-DETR framework. We introduce the distribution rectification distillation (DRD) method to refine the performance of Q-DETR. From left to right, we show the detailed decoder architecture of Q-DETR and the learning framework of Q-DETR, respectively. Q-Backbone, Q-Encoder, and Q-Decoder denote the quantized backbone, encoder, and decoder, respectively.
inaccurate object localization. Therefore, a more generic method for DETR quantization is
necessary.
To tackle the issue above, we propose an efficient low-bit quantized DETR (Q-DETR) [257] that rectifies the query information of the quantized DETR to match that of its real-valued counterpart. Figure 2.10 provides an overview of Q-DETR, which is mainly accomplished by a distribution rectification knowledge distillation (DRD) method. We find that knowledge transfer from the real-valued teacher to the quantized student is ineffective primarily because of the information gap and distortion. Therefore, we formulate DRD as a bi-level optimization framework built on the information bottleneck (IB) principle. Generally, it comprises an inner-level optimization that maximizes the self-information entropy of the student queries and an upper-level optimization that minimizes the conditional information entropy between student and teacher queries. At the inner level, we align the query distribution guided by its Gaussian-like shape, as shown in Fig. 2.8, driving the queries to an explicit state of maximum information entropy during forward propagation. At the upper level, we introduce a new foreground-aware query matching scheme that filters out low-quality student queries to obtain an exact one-to-one matching between student and teacher queries, providing valuable knowledge gradients that push toward the minimum conditional information entropy during backward propagation.
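As an illustration of the inner-level step, the sketch below aligns the Gaussian-like student decoder queries with a zero-mean, unit-variance reference, the state taken here to represent their maximum-entropy form. This is a minimal PyTorch-style sketch; the function name, tensor layout, and choice of statistics are our assumptions rather than the exact Q-DETR implementation.

```python
import torch

def align_student_queries(q_s: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Inner-level sketch: align Gaussian-like student decoder queries with a
    zero-mean, unit-variance reference before distillation.

    Assumed layout: q_s has shape (num_queries, batch_size, embed_dim),
    i.e., the output of the quantized decoder (Q-Decoder).
    """
    # Per-channel statistics over the query and batch dimensions.
    mean = q_s.mean(dim=(0, 1), keepdim=True)
    var = q_s.var(dim=(0, 1), unbiased=False, keepdim=True)
    # Standardize so that each channel approximately follows a standard
    # Gaussian, the maximum-entropy shape assumed for the queries.
    return (q_s - mean) / torch.sqrt(var + eps)
```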
2.4.1 Quantized DETR Baseline
We first construct a baseline to study the low-bit DETR since no relevant work has been
proposed. To this end, we follow LSQ+ [13] to introduce a general framework of asymmetric
activation quantization and symmetric weight quantization:
\[
x_q = \left\lfloor \mathrm{clip}\!\left\{\frac{x - z}{\alpha_x},\, Q^x_n,\, Q^x_p\right\}\right\rceil, \qquad
w_q = \left\lfloor \mathrm{clip}\!\left\{\frac{w}{\alpha_w},\, Q^w_n,\, Q^w_p\right\}\right\rceil,
\]
\[
Q_a(x) = \alpha_x \circ x_q + z, \qquad Q_w(w) = \alpha_w \circ w_q,
\tag{2.24}
\]
where $\mathrm{clip}\{y, r_1, r_2\}$ clips the input $y$ with value bounds $r_1$ and $r_2$; $\lfloor y \rceil$ rounds $y$ to its nearest integer; and $\circ$ denotes channel-wise multiplication. And $Q^x_n = -2^{a-1}$, $Q^x_p = 2^{a-1} - 1$, where $a$ is the activation bit-width.